1 Hardware reuse at the behavioral level

Patrick Schaumont, Radim Cmar, Serge Vernalde, Marc Engels, Ivo Bolsens 

June 1999 Proceedings of the 36th ACM/IEEE conference on Design automation 
2 Improving superscalar instruction dispatch and issue by exploiting dynamic code
sequences

Sriram Vajapeyam, Tulika Mitra 

May 1997 ACM SIGARCH Computer Architecture News, Proceedings of the 24th
annual international symposium on Computer architecture, volume 25 issue 2
Full text available: pdf(1.76 MB) Additional Information: full citation, abstract, references, citings, index terms

Superscalar processors currently have the potential to fetch multiple basic blocks per cycle 
by employing one of several recently proposed instruction fetch mechanisms. However, this 
increased fetch bandwidth cannot be exploited unless pipeline stages further downstream 
correspondingly improve. In particular, register renaming a large number of instructions per 
cycle is difficult. A large instruction window, needed to receive multiple basic blocks per 
cycle, will slow down dependence resolution ... 

3 Mimic: a fast System/370 simulator
C. May 

July 1987 ACM SIGPLAN Notices, Papers of the Symposium on Interpreters and
interpretive techniques, volume 22 issue 7
Full text available: pdf(1.16 MB) Additional Information: full citation, abstract, citings, index terms

Software simulation of one computer on another tends to be slow. Traditional simulators 
typically execute about 100 instructions on the host machine per instruction simulated. 
Newer simulators reduce the expansion factor to about 10, by saving and reusing 
translations of individual instructions. This paper describes an experimental simulator which 
takes the progression one step further, translating groups of instructions as a unit. This 
approach, combined with flow analysis, reduces the expansio ... 

4 Physical integrity in a large segmented database
Raymond A. Lorie 

March 1977 ACM Transactions on Database Systems (TODS), volume 2 issue 1
Full text available: pdf(1.12 MB) Additional Information: full citation, abstract, references, citings, index terms

A database system can generally be divided into three major components. One component 
supports the logical database as seen by the user. Another component maps the information
into physical records. The third component, called the storage component, is responsible for
mapping these records onto auxiliary storage (generally disks) and controlling their transfer 
to and from main storage. This paper is primarily concerned with the implementation of a 
storage component. It considers ... 

Keywords: checkpoint-restart, database, recovery, storage management 



5 Active messages: a mechanism for integrated communication and computation 
Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, Klaus Erik Schauser 

April 1992 ACM SIGARCH Computer Architecture News, Proceedings of the 19th
annual international symposium on Computer architecture, volume 20 issue 2
Full text available: pdf(1.40 MB) Additional Information: full citation, abstract, references, citings, index terms

The design challenge for large-scale multiprocessors is (1) to minimize communication 
overhead, (2) allow communication to overlap computation, and (3) coordinate the two 
without sacrificing processor cost/performance. We show that existing message passing 
multiprocessors have unnecessarily high communication costs. Research prototypes of 
message driven machines demonstrate low communication overhead, but poor processor 
cost/performance. We introduce a simple communication mechanism, 

6 Active messages: a mechanism for integrating communication and computation
Thorsten von Eicken, David E. Culler, Seth Copen Goldstein, Klaus Erik Schauser 
August 1998 25 years of the international symposia on Computer architecture
(selected papers)
Full text available: pdf(1.47 MB) Additional Information: full citation, references, index terms



7 Empirical evaluation of the CRAY-T3D: a compiler perspective 

Remzi H. Arpaci, David E. Culler, Arvind Krishnamurthy, Steve G. Steinberg, Katherine Yelick 
May 1995 ACM SIGARCH Computer Architecture News, Proceedings of the 22nd
annual international symposium on Computer architecture, volume 23 issue 2
Full text available: pdf(1.48 MB) Additional Information: full citation, abstract, references, citings, index terms

Most recent MPP systems employ a fast microprocessor surrounded by a shell of 
communication and synchronization logic. The CRAY-T3D provides an elaborate shell to 
support global-memory access, prefetch, atomic operations, barriers, and block transfers. 
We provide a detailed empirical performance characterization of these primitives using 
micro-benchmarks and evaluate their utility in compiling for a parallel language. We have 
found that the raw performance of the machine is quite impressive and ... 

8 Surpassing the TLB performance of superpages with less operating system support
Madhusudhan Talluri, Mark D. Hill 

November 1994 Proceedings of the sixth international conference on Architectural
support for programming languages and operating systems, volume 29, 28 issue 11, 5
Full text available: pdf(1.50 MB) Additional Information: full citation, abstract, references, citings, index terms

Many commercial microprocessor architectures have added translation lookaside buffer 
(TLB) support for superpages. Superpages differ from segments because their size must be 
a power of two multiple of the base page size and they must be aligned in both virtual and 
physical address spaces. Very large superpages (e.g., 1MB) are clearly useful for mapping 
special structures, such as kernel data or frame buffers. This paper considers the 
architectural and opera ... 

9 Depth-order point classification techniques for CSG display algorithms 
Frederik W. Jansen 
January 1991 ACM Transactions on Graphics (TOG), volume 10 issue
Full text available: pdf(4.54 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Constructive Solid Geometry (CSG) defines objects as Boolean combinations (CSG trees) of 
primitive solids. To display such objects, one must classify points on the surfaces of the 
primitive solids with respect to the resulting composite object, to test whether these points 
lie on the boundary of the composite object or not. Although the point classification is trivial 
compared to the surface classification (i.e., the computation of the composite object), for 
CSG models with a large number o ... 

10 Reducing instruction cache energy consumption using a compiler-based strategy
W. Zhang, J. S. Hu, V. Degalahal, M. Kandemir, N. Vijaykrishnan, M. J. Irwin 
March 2004 ACM Transactions on Architecture and Code Optimization (TACO), volume 1
issue 1
Full text available: pdf(1.15 MB) Additional Information: full citation, abstract, references, index terms

Excessive power consumption is widely considered as a major impediment to designing 
future microprocessors. With the continued scaling down of threshold voltages, the power 
consumed due to leaky memory cells in on-chip caches will constitute a significant portion of 
the processor's power budget. This work focuses on reducing the leakage energy consumed 
in the instruction cache using a compiler-directed approach. We present and analyze two 
compiler-based strategies termed as conservative and optim ... 

Keywords: Leakage power, cache design, compiler optimizations 



11 Circuit considerations for low power: The microarchitecture of a low power register file
Nam Sung Kim, Trevor Mudge 

August 2003 Proceedings of the 2003 international symposium on Low power 
electronics and design 

Full text available: pdf(119.10 KB) Additional Information: full citation, abstract, references, index terms

The access time, energy and area of the register file are often critical to overall performance 
in wide-issue microprocessors, because these terms grow superlinearly with the number of 
read and write ports that are required to support wide-issue. This paper presents two 
techniques to reduce the number of ports of a register file intended for a wide-issue 
microprocessor without hardly any impact on IPC. Our results show that it is possible to 
replace a register file with 16 read and 8 write ports ... 

Keywords: instruction level parallelism, low power, out-of-order processor, register file, 
write queue 



12 Energy efficiency in system design: Energy frugal tags in reprogrammable I-caches for
application-specific embedded processors

Peter Petrov, Alex Orailoglu 

May 2002 Proceedings of the tenth international symposium on Hardware/software 
codesign 

Full text available: pdf(649.65 KB) Additional Information: full citation, abstract, references, index terms

In this paper we present a software-directed customization methodology for minimizing the 
energy dissipation in the instruction cache, one of the most power consuming 
microarchitectural components of high-end embedded processors. We target particularly the 
instruction cache tag operations and show how an exceedingly small number of tag bits, if 
any, are needed to compute the miss/hit behavior for the most frequently executed 
application loops, thus minimizing the energy needed to perform the tag ... 

13 Database issues for event-based middleware: MJoin: a metadata-aware stream join
operator

Luping Ding, Elke A. Rundensteiner, George T. Heineman 
June 2003 Proceedings of the 2nd international workshop on Distributed event-based
systems

Full text available: pdf(229.21 KB) Additional Information: full citation, abstract, references

Join algorithms must be re-designed when processing stream data instead of persistently 
stored data. Data streams are potentially infinite and the query result is expected to be 
generated incrementally instead of once only. Data arrival patterns are often unpredictable 
and the statistics of the data and other relevant metadata often are only known at runtime. 
In some cases they are supplied interleaved with the actual data in the form of stream 
markers. Recently, stream join algorithms, like Sym ... 

Keywords: Metadata, XML Stream, XQuery Subscription, constraint, join algorithms, 
optimization 